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ABSTRACT 

We introduce a Bayesian solution to the problem of inferring the density profile of 
strong gravitational lenses when the lens galaxy may contain multiple dark or faint 
substructures. The source and lens models are based on a superposition of an un¬ 
known number of non-negative basis functions (or “blobs”) whose form was chosen 
with speed as a primary criterion. The prior distribution for the blobs’ properties is 
specified hierarchically, so the mass function of substructures is a natural output of 
the method. We use reversible jump Markov Chain Monte Carlo (MCMC) within Dif¬ 
fusive Nested Sampling (DNS) to sample the posterior distribution and evaluate the 
marginal likelihood of the model, including the summation over the unknown number 
of blobs in the source and the lens. We demonstrate the method on two simulated data 
sets: one with a single substructure, and one with ten. We also apply the method to 
the g-band image of the “Cosmic Horseshoe” system, and find evidence for more than 
zero substructures. However, these have large spatial extent and probably only point 
to misspecifications in the model (such as the shape of the smooth lens component or 
the point spread function), which are difficult to guard against in full generality. 

Key words: gravitational lensing: strong — methods: data analysis — methods: 
statistical 


1 INTRODUCTION 

Galaxy-galaxy gravitational lensing is a powerful astrophys- 
ical tool for studying the distribution of matter, including 
dark matter, in galaxies ( Treu|[20T0 ). One promising appli¬ 
cation of lensing is to study and measure the properties of 


dark matter substructures in the lens galaxy (Koopmans 
[2005 ). In recent years, this promise has been realised in 
several lens systems which are thought to contain at least 
one dark substructure (Vegetti et al. 2012 2010 [Vegetti, 
[Czoske, Koopmans|[20T0 ). In these studies, the lens was 
modelled as a superposition of a smooth component (such 
as an elliptical power-law matter distribution) plus a pixel- 
lized “non-parametric” correction term. The estimated spa¬ 
tial structure of the correction term provides clues about the 
locations of possible substructures. In a second modelling 
step, the lens is assumed to be a smooth overall component 
plus a compact “blob” near any locations suggested by the 
correction term. This kind of inference has also been done 


with point-like multiple images of quasars (Fadely & Keeton 


2012 ). 


* To whom correspondence should be addressed. Email: 
bj.brewerOauckland.ac.nz 


In this paper, we introduce an approach for solving this 
same problem (inferring the number and properties of dark 
substructures given an image) in a more direct fashion, by 
fitting a model with an unknown number of substructures 
directly to the image data. However, with appropriate mod¬ 
ifications, it may also prove useful in other galaxy-galaxy 
lens modelling areas. One example is time-delay cosmog¬ 
raphy, which requires realistically flexible lens models, and 
reliable quantification of the uncertainties ( Suyu et al.|2013 


MIllGrillo et al.|2015 l 


Lens modelling has been a topic of much interest over 
the last two decades. Most approaches are based on Bayesian 
inference ( Sivia fc Skilling|2QQ6 O’Hagan and Forster|20Q4 ), 


maximum likelihood (Millar 2011) or variations thereof. 


These approaches may be categorized according to the fol¬ 
lowing criteria: 

(i) Whether to use a simply parameterized (e.g. a Sersic 


profile), moderately flexible (e.g. a mixture model. Brewer et 
al.|2011 ) or free-form (e.g. pixellated) model for the surface 
brightness profile of the source; 

(ii) Whether to use a simply parameterized (e.g. Singular 
Isothermal Ellipsoid [SIE]) or flexible model (e.g. pixellated) 
for the mass profile of the lens; and 

(iii) How to compute the results (e.g. optimization meth- 
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ods, Markov Chain Monte Carlo [MCMC] with source pa¬ 
rameters marginalized out analytically, or MCMC with the 
source parameters included). 

Recent sophisicated approaches investigating different prior 


If we are interested in detecting and measuring the 
properties of possible dark substructures in a lens galaxy, 
a superposition of an unknown number of “blobs” is the 
most natural model. In addition, if we want to constrain the 


assumptions and computational approaches include |Coles, 

Vegetti et al.|2014), we will need a hierarchical model which 

Read, & Saha (2014| 

), Tagore & Jackson (2015), and Birrer, 

specifies the prior probability distribution for the masses 

Amara, & Refregier| (2015). 

conditional on some hyperparameters. Inferring the mass 


Each approach involves tradeoffs between convenience 
and realism. Simply parameterized models, such as Sersic 
surface brightness profiles for the source, and singular 
isothermal ellipsoid plus external shear (SIE+ 7 ) models for 
the lens ( Kormann, Schneider, Bartelmann|1994| , are very 
convenient. They only have a few adjustable parameters, and 
they capture (to “first order”) relevant prior information 
that we have about the source and lens profiles. Of course, 
they are clearly simplifications, and can produce misleading 
results if the actual profile is very different from any member 
of the assumed family. 

On the other hand, pixellated models for the source (e.g. 
Suyu et al.|20Q6 ) or the lens (e.g. Coles, Read, fc Saha|2014| ) 
can in principle represent “any” source surface brightness 
profile or lens projected density profile. However, the prior 
distribution over pixel values is often chosen to be a multi¬ 
variate gaussian for mathematical reasons, so that the source 
can be analytically marginalized out ( Warren Dye|2003 ). 
Unfortunately, a multivariate gaussian prior over pixel inten¬ 
sities usually corresponds to a poor model of our prior beliefs 
about the source. It assigns virtually zero prior probability 
to the hypothesis that the source actually looks like a galaxy, 
and very high prior probability to the hypothesis that the 
source looks like noise (or blurred noise). These priors also 
assign nonzero probability to negative surface brightness or 
density values; in fact, the marginal prior probability that 
any pixel is negative is typically 0.5. 


Brewer et al. (2011) argued that an ideal modelling ap¬ 


proach lies somewhere between simply-parameterised and 
pixellated models, which is also a motivation behind the in¬ 


vestigation of shapelets by Tagore & Jackson (2015). One 
way of achieving this is with mixture models. The source 
and the lens can be built up from a mixture of a moderate 
number (from a few to a few hundred) of simply param- 
eterised components. Eor the source, this allows us to in¬ 
corporate prior knowledge about the local correlations (the 
surface brightness at any particular point is likely to be sim¬ 
ilar to that at a nearby point) and the fact that most of the 
sky is dark ( Brewer Lewis|2006 ). The Brewer et al. (2011) 
model also allowed for multi-band data, and allowed for our 
expectation that the image may or may not be similar in 
different bands. 

The present paper is similar in approach to |Brewer| 


et al. (2011). The main differences are: i) we use a sim¬ 
pler and faster set of basis functions; ii) we apply it to 
the lens as well as the source, allowing for the possibil¬ 
ity of substructure; and iii) the implementation is based 


on a C++ template library developed by Brewer (2014) 


which allows for hierarchical priors and trans-dimensional 
MCMC to be implemented in a relatively straightforward 
manner. However, in the present paper we do not account 
for multi-band data; this will be reserved for a future con¬ 
tribution. The source code for this project is available at 
http://www.github.com/eggplantbren/Lensing2 


function of the blobs then reduces to calculating the poste¬ 
rior distribution for the hyperparameters. This is related to 
the general principle that we should (ideally) construct our 
inference methods so that the quantities we infer directly an¬ 


swer our scientific questions (Alsing et al. 2015 Schneider 


et al.|[20T5 Pancoast, Brewer, & Treu 2011). Here, we are 

referring to the mass function of the substructures within a 
single lens galaxy. An additional level of hierarchy would be 
needed to answer questions about a sample of lens galaxies. 


2 THE MODEL 

We now describe the details of our model and its parame¬ 
terization. The motivation for most of our modelling choices 
is a compromise between computational efficiency, realism, 
and ease of implementation. None of the choices we have 
made are final in any sense; rather, this model should be 
considered a proof of concept. We encourage exploration of 
other choices. Since we can compute marginal likelihoods, 
Bayesian model comparison between our choices here and 
any proposed alternatives should be straightforward, if a 
union of our hypothesis space and another can be consid¬ 
ered a reasonable model of prior uncertainty. 


2.1 The Source 

The surface brightness profile of the source is assumed to be 
composed of a sum of a finite number of “blobs”, or basis 
functions, in order to allow some flexibility while incorpo¬ 
rating prior information about the non-negativity of surface 
brightness, and the spatial correlation expected in real sur¬ 
face brightness profiles. Eor computational speed, we choose 
the following functional form for the surface brightness pro¬ 
file of a single blob centered at the origin: 


f{x,y) = 


2A 

0 , 


r) , < w 

otherwise 


( 1 ) 


where r = w is the width of the blob, and A 

is the total flux of the blob (i.e. the integral of the surface 
brightness over the entire domain). These basis functions are 
inverted paraboloids, which are faster to evaluate than gaus- 
sians, since they do not contain an exponential function. In 
addition, the finite support means that each blob will evalu¬ 
ate to zero over a large fraction of the domain which confers 
an additional speed advantage. More conventional choices 
such as gaussians are possible, as are alternative compact 
basis functions, perhaps inspired by the smoothed particle 


hydrodynamics literature (Dehnen & Aly 2012 ). Alterna¬ 


tive choices, such as Sersic profiles, may be more realistic 
despite their computational cost. This is especially likely 


in the case of an early-type source galaxy (e.g. Auger et 
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^| 2011 ). Such modifications are possible without great ef¬ 


fort using the RJObject library, and once implemented, the 
marginal likelihood can be compared across different choices 
for the source model. 

If our model contains Nsrc such blobs, positioned at 
with widths {wi} and total fluxes {Ai}, the 
overall surface brightness profile is: 


Nsrc ( 

2Ai f 


Ti ^ Wi 


fix,y) = < 

TTTO? V 

1 

( 2 ) 

i = l [ 

, 0, 


otherwise 


where {x — + (y — Under these assump¬ 

tions, the source can be described in its entirety by the fol¬ 
lowing parameters: 

{iVsrc,{0r}ti} (3) 

where denotes the parameters for source blob z: 




(4) 


The dimensionality of the parameter space for the source de¬ 
pends on the minimum and maximum values of A^src, which 
we set to 0 and 100 respectively (more detail about priors 
is given in Section]^. Therefore the source is described by 
1-401 parameters. 


2.2 The Lens 

The surface mass density profile of the lens is modelled as a 
superposition of a singular isothermal ellipsoid plus external 
shear (SIE+ 7 ) and N circular lens “blobs”. The SIE +7 is a 
simple and widely used lens model with analytically avail¬ 
able deflection angles, and is intended to account for the bulk 
of the lensing effect due to the smooth spatial distribution of 
(visible and dark) matter in the lens galaxy. Unfortunately, 
along with simplicity comes a lack of realism, and generalis¬ 
ing the smooth lens model is an important future step. The 
softened power law elliptical mass density or SPEMD model 
is a popular generalization of the SIE where the deflection 
angles can still be computed quickly using the approxima¬ 


tions given by Barkana (1998). 


Ultimately, concentric mixtures of smooth components 
may be the most useful for realistic inference of the den¬ 
sity profile of the halo. A common approach in the lensing 
community, that is similar in spirit, is a superposition of an 
elliptical power law, external shear, and a pixellated poten¬ 
tial correction. However, the direct use of blobs is closer to 
the scientific question at hand when investigating postential 
dark substructures. 

The SIE +7 has nine free parameters: the (circularized) 
Einstein radius b, axis ratio q, central position (xc,yc), ori¬ 
entation angle 0, the external shear 7 , and the orientation 
angle of the external shear, 6^. 

The N lens blobs are intended to model possible dark 
or faint substructures in the projected mass profile of the 
lens. A lensing blob with mass M and width v centered at 
the origin has the following surface mass density profile: 


pix,y) = 


2M 

0 , 


(l-s) 


r ^ V 
otherwise 


(5) 


which is the same as the surface brightness profile of a source 
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blob. The deflection angles for a single blob are: 
( 2 -rVu 2 ) 


ax{x,y) = 
ayix.y) = 


Mx 
irr- 

( 2 -■'"/«“) 3 

My 

Trr^ ’ 


Mx 

r ^ V 
otherwise 

(6) 

My 

TTV^ ’ 

r ^ V 
otherwise 

(7) 


where r = yjx‘^ + 2 /^- These do not depend on any compu¬ 
tationally expensive functions such as square roots or expo¬ 
nentials. Eor N such blobs the deflection angles are summed 
over all blobs. Therefore the lens can be entirely described 
by the following parameters: 


l^SIE, iVlens, 

where denotes the parameters for lens blob i\ 


^lens 


{ 


lens lens 71 /r 
Xi ,yi ,Mi 




( 8 ) 


(9) 


and 0SIE describes the parameters for the SIE +7 compo¬ 
nent: 


^siE = . 


( 10 ) 


Since the model is not of fixed dimension (the num¬ 
ber of source and lens “blobs” is unknown), we used 
a trans-dimensional MCMC sampler based on reversible 
jump MCMC (Green 1995). This framework has been 


used successfully in many astronomical inference problems 
(e.g. Jones, Kashyap, & van Dyk 2014 Umstatter et al. 


2005). We implemented the MCMC using the RJObject 


software (Brewer 2014), a C++ library for implementing 


trans-dimensional MCMC with hierarchically-specified pri¬ 
ors when the model is a “mixture model” or similar, as is the 
case here. The RJObject library uses the Diffusive Nested 
Sampling algorithm (DNS; Brewer, Partay, &: Csanyi| 2011 ) 
for its sampling, but the trans-dimensionality is handled by 
the MCMC moves. Therefore, the marginal likelihood we ob¬ 
tain is one that involves a sum over the hypothesis space for 
Nsrc and Aliens- In other words, we do not need separate runs 
with different trial values of Nsrc and Wens- Previous astro¬ 
nomical applications of RJObject include [Huppenkothen e^ 
al. (2015) and [Brewer Donovan (2015). Proposal moves 
for the parameters (positions and masses of the source and 
lens blobs), and birth/death moves to add or remove blobs 
to the source or lens, are handled internally by RJObject and 
fit into the general framework described in Brewer (2014). 


3 PRIOR PROBABILITY DISTRIBUTIONS 

The prior probability distributions for all hyperparameters, 
parameters, and the data, are given in Table ^ and a fac¬ 
torization of the joint distribution is displayed in Eigure[^ 
With these priors, we aim to express a large degree of prior 
uncertainty (hence the liberal use of uniform, log-uniform, 
and Cauchy distributions). However, we also specify some 
priors hierarchically to allow (for example) blobs to clus¬ 
ter around a typical central position, rather than implying a 
high probability for substructure positions being spread uni¬ 
formly (in a frequency sense) over the sky. We also applied 
hierarchical priors to the substructure masses (and source 
fluxes) so that, while the typical order of magnitude of the 
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masses is unknown, the masses themselves are likely to be 
roughly the same order of magnitude. The hyperparameters 
of these distributions can also be interpreted as straight¬ 
forward answers to questions about the substructure mass 
function. 

We assign somewhat informative (but heavy-tailed) 
Cauchy priors to the central positions of the source and the 
lens. One might object to the Cauchy priors for the central 
positions (and argue instead for gaussians) on the basis that 
they do not have rotational symmetry. However, the priors 
we are assigning are prior only to the values of the data pixel 
intensities only, and not other facts about the data, such as 
its dimensions in arc seconds, or its rectangular shape. Given 
these, there is no reason to insist on rotational symmetry. 
The Cauchy priors are intended to enhance the plausibility 
(relative to what a uniform prior would imply) that the lens 
and source are somewhere near the centre of the image, but 
in a cautious way. 

The “circular” conditional prior for the blob positions 
has the following density: 

p{x,y)dxdy = ^Jexp(-:^) dxdj/ (11) 

where r — y/{x — XcY {v ~ VcY■ This results from an ex¬ 
ponential distribution (with expectation W) for the radial 
coordinate and a uniform distribution for the angular coor¬ 
dinate in plane polar coordinates. This was chosen (as op¬ 
posed to a more “obvious” choice such as a normal distribu¬ 
tion) for computational reasons: the RJObject code requires 
a function to transform from Uniform(0, 1) distributions to 
this distribution and back (making use of cumulative distri¬ 
bution functions and their inverses), and the normal would 
have required the use of special functions for this. 


3.1 Conditional prior for the data 

The conditional prior for the data given the parameters is a 
product of independent gaussian distributions, one for each 
pixel. The mean of the gaussian is given by the “mock” 
noise-free image we would expect based on the parameters. 
The standard deviation of the gaussian is a combination of 
three terms; the first (denoted Sij) is a “noise map” which 
is loaded from a file, the second is an unknown constant cro 
which applies to the whole image, and the third is propor¬ 
tional to the square root of the mock image, with propor¬ 
tionality constant cri. 

This is usually called the “sampling distribution”, or 
sometimes just the “likelihood”. However, sampling distri¬ 
bution is misleading since no physical frequency distribution 
exists which is being sampled from. The term likelihood is 
also usually used to refer to the scalar function of the param¬ 
eters obtained when the data are known. For a discussion of 
this object is really a prior distribution, see 
, pp. 33-35). 

It is common to provide a “variance image” to lens mod¬ 
elling software, which specifies the standard deviations used 
in the likelihood function. However, telescopes do not pro¬ 
vide variance images. It also does not make sense to specify 
vague priors prior to the image pixel values but posterior 
to the variance image (which usually resembles the image 
itself, and therefore would be highly informative). Allowing 
the standard deviation of the gaussian in the likelihood to 


the view that 


Caticha (2008 


depend on the mock image brightness is a more principled 
way of modelling our actual inferential situation. Neverthe¬ 
less, we allow for an input variance image as well, which can 
be used for ad-hoc masking of troublesome regions (such as 
unmodelled flux from other sources). 


4 COMPUTATION 


Computing the posterior distribution over the parameters of 
such a model requires that we can implement Markov Chain 
Monte Carlo (MCMC) over the space of possible sources 
and lenses. To compute the posterior distribution for the 
parameters (of which there are 24-464, depending on the 
values of and Mens), we use Diffusive Nested Sampling 
( Brewer, Partay, Csanyi|2011 ), a form of Nested Sampling 
( Skilling||2006 ) that uses the Metropolis algorithm to move 
around the parameter space. 

The proposal distributions for the blob parameters 
(both source and lens) are handled by the RJObject library 
( Brewer|2014 ). This includes birth and death proposals that 
increase or decrease either Mrc, or Mens, as well as propos¬ 
als that move the blobs (in their parameter space) while 
keeping the number of blobs fixed. RJObject also facilitates 
proposals that change the hyperparameters (either for the 
lens or the source) while keeping the actual blobs in place, 
as well as proposals that change the hyperparameters and 
shift all of the relevant blobs in so they are appropriate for 
the new values of the hyperparameters. 

Evaluating the likelihood function requires that we com¬ 
pute a “mock” image from the current setting of the param¬ 
eters. This mock image is calculated using standard ray¬ 
tracing methods with a uniform grid of n x n rays fired per 
image pixel. However, certain kinds of proposals do not af¬ 
fect the image in any way, such as those which change the 
noise parameters. For efficiency we do not recompute the 
mock image in these cases. 


5 “EASY” SIMULATED DATA 

To demonstrate the method, we generated a simulated 
dataset where the lens was an SIE +7 profile with a sin¬ 
gle additional substructure. The image is shown in Figure 
and consists of 100 x 100 pixels. The point spread func¬ 
tion (PSF) was compact and was defined on a 5 x 5 grid. 
The data was created by firing only one ray per pixel, and 
the inference was also carried out under the same level of 
approximation. Since the model assumptions are all correct 
for this image, the demonstration here is purely to illustrate 
the computational tract ability of the problem, and the kind 
of outputs the method can produce. The SIE +7 parameters 
were, to three significant figures, {5, g, 7 , ^ 7 } = 

{5.25,0.323,0.191,-0.341,0.833,0.0322,0.497}, where the 
length units in the image plane are arbitrary, and rotation 
angles are measured in radians. The single substructure is 
located at (—2.57,2.20), about halfway between the center 
and the upper left image. Its mass is 2.50 units, using the 
convention where the critical density is 1. For comparison, 
the SIE mass within its critical ellipse is 7 r 5 ^ = 86.6 units. 

We executed the code to generate 5,000 samples from 
the “mixture of constrained priors” distribution of DNS. Our 
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Quantity 

Meaning 

Prior 

Numbers of Blobs 

-^src 

Number of source blobs 

Uniform(0,1,..., 100) 

-^lens 

Number of lens blobs 

Uniform(0,1,..., 10) 

Source hyperparameters (c^src) 

{^r,yr) 

Typical position of source blobs 

iid Cauchy (location^ (0, 0), scale=0.1x imageWidth) 

Rsrc 

Typical distance of blobs from {x^J^, V^c^) 

LogUniform(0.01 X imageWidth, 10 X imageWidth) 

P'STC 

Typical flux of source blobs 

In(Aisrc) -- Cauchy(0, l)r(-21.205, 21.205) 

Wsrc 

max 

Maximum width of source blobs 

LogUniform(0.001 X imageWidth, imageWidth) 

mm 

Minimum width of source blobs 

Uniform(0, 

Lens hyperparameters (oiiens) 


Typical position of lens blobs 

iid Cauchy (location^ (0, 0), scale=0.1x imageWidth) 

l^lens 

Typical distance of blobs from 

LogUniform(0.01 X imageWidth, 10 X imageWidth) 

Miens 

Typical mass of lens blobs 

In(Mlens) Cauchy(0, 1)T(-21.205, 21.205) 

Wlens 

max 

Maximum width of lens blobs 

LogUniform(0.001 X imageWidth, imageWidth) 

Wlens 

mm 

Minimum width of lens blobs 

Uniform(0, 

Source Blob Parameters (0^^) 


Blob position 

Circular(location=(a:®’^‘^, scale=Rsrc) 

Ai 

Blob flux 

Exponential (mean=/isrc) 

Wi 

Blob width 


Lens Blob Parameters (^J®’^®) 

(3.1ens_ j^lens) 

Blob position 

Circular(location=(a;Jf’^®, yjf’^®), scale=Riens) 

Mi 

Blob mass 

Exponential (mean=y,iens) 

Vi 

Blob width 


Smooth Lens Parameters (^sie) 

b 

SIE Einstein Radius 

LogUniform(0.001 X imageWidth, imageWidth) 

q 

Axis ratio 

Uniform(0, 0.95) 


Central position 

iid Cauchy(location= (0, 0), scale=0.1x imageWidth) 

0 

Orientation angle 

Uniform(0, tt) 

7 

External shear 

Cauchy(0, 0.05)T(0, oo) 


External shear angle 

Uniform(0, tt) 

Noise Parameters (cr) 

CTO 

Constant component of noise variance 

In(cro) ~ Cauchy(0, 1)T(-21.205, 21.205) 

cri 

Coefficient for variance increasing with flux 

ln7l) -- Cauchy7 l7721.205, 21.205) 

Data (L>) 

Dij 

Pixel intensities 

Normal(mij, + erg + (Jimij) 


Table 1. The prior distribution for all hyperparameters, parameters, and the data, in our model. Uniform(a,, b) is a uniform distribution 
between a and b. LogUniform(a, b) is a log-uniform distribution (with density f{x) oc 1/x, sometimes erroneously called a Jeffreys prior) 
between a and b. The notation T{a,(3) after a distribution denotes truncation to the interval [o:,^]. The constant imageWidth is the 
geometric mean of the image dimensions in the x and y directions. 


thinning factor was 10^, so 5x 10® MCMC iterations were ac¬ 
tually performed, taking approximately 48 hours on a mod¬ 
est desktop pcQ After resampling these samples to reflect 
the posterior distribution, we were left with 500 (equally 
weighted) posterior samples. 

The posterior distributions for complex models, such as 
the mixture models used here, are often challenging and un¬ 
intuitive to summarise. One effective way to visualise the 


^ The computer was purchased in 2012 and has a second- 
generation intel i7 processor, and the process was run on 8 
threads. 


uncertainty in the inferences is to play a movie where each 
frame is a sample from the posterior distribution. The de¬ 
gree to which the frames differ from each other conveys the 
uncertainty remaining after taking the data into account. 


For the purposes of a paper, static summaries are more 
convenient than movies. One useful summary is based on 
the concept of empirical measure. The empirical measure of 
the substructure positions is a function that takes the actual 
substructure positions and produces a “density 

function” over two dimensions, composed of delta functions 
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6 Brewer, Huijser and Lewis 



Figure 1. A probabilistic graphical model (PGM) of the dependence structure of the prior information, produced using DAFT 
(www.daft-pgm.org). The prior for the source and lens blob parameters are specified conditional on hyperparameters. The purpose 
of this is to induce dependence in the prior distribution for the blob parameters, implying (for example) that the mass of one blob is 
slightly informative about the mass of another, and that the locations might be clustered around a certain typical location. 


Simulated Dataset 



Figure 2. A simulated image of a simple “galaxy” lensed by an 
SIE +7 lens close to the center of the image, indicated by the 
white circle, plus a single substructure close to the top left image 
(indicated by a star symbol). The image was blurred by a PSF 
and had some noise added. In an arbitrary system of units, this 
image extends from -8 to 8 in the x and y directions. 


at the positions themselves: 

AT- -^lens 

r /lens lensM V (5^ V - (12) 

Iy^ )j. ^ ' / j ^ Iy y^ y • 


Intuitively, the empirical measure is a mathematical ob¬ 
ject that is like an “infinite resolution histogram”, in this 
case a two dimensional histogram, of the substructure posi¬ 
tions. Being a function of the actual substructure positions, 
the empirical measure is not available to us since we do not 
know those positions with certainty. However, we have sam¬ 
ples from the posterior distribution for those positions, and 
can use these (trivially) to create samples from the posterior 
distribution for the empirical measure. We can also sum¬ 
marise this posterior, for example, by taking its expected 
value. 

The posterior expected value of the empirical measure 
is: 

/ -^lens 

p{e\D) y - do ( 13 ) 

i=i 

where 0 denotes all parameters and hyperparameters (in¬ 
cluding the Ns) and “/ is an integral and summation 
over the entire parameter space. 

Since we can approximate posterior expectations using 
Monte Carlo, we can obtain the expected value of the em¬ 
pirical measure using: 

1 ^ ^lens 

- X] ^ <5^ (a: - a:)!"®, y - yl|"®) (14) 

^ k=l i=l 

where n is the number of posterior samples. The resulting 
function is an “image” with a point mass wherever a sub¬ 
structure occurred. For visualisation purposes the image can 
be blurred, or calculated at a lower resolution by replacing 
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Figure 3. The joint posterior distribution for the SIE mass (in¬ 
tegrated within its critical ellipse, which is not the critical curve 
of the lens overall), and the total mass in substructures. 


the Dirac-delta function with a discrete version which re¬ 
turns a nonzero constant if a substructure appears in a pixel 
or zero otherwise. 

The masses of the smooth and substructure components 
of the lens are usually of interest. Since the total mass of 
an SIE lens is infinite, the question needs to be redefined, 
so we ask about the mass within some aperture of finite 
area. For an SIE, the mass (in dimensionless units based on 
the critical density) within the critical ellipse is simply . 
However, the blobs have finite total masses {Mi}. To obtain 
the posterior distribution for the lens masses, one must be 
clear about exactly which mass they are talking about, and 
many definitions are possible, although some might be more 
meaningful or well constrained than others. 

In the present paper we do not address the question 
of exactly which quantities related to the density profile of 
the lens are most scientifically interesting. Nevertheless, we 
can verify that the results of the inference do behave in un¬ 
derstandable ways. For example, in Figure we plot the 
joint posterior distribution for the SIE mass within its crit¬ 
ical ellipse and the total substructure mass over the entire 
domain. As one would expect, there is a strong dependence 
between these two quantities in the posterior distribution, as 
mass in the SIE can be traded off with mass in substructures 
to some extent, while the model remains (loosely speaking) 
“consistent with the data”. 

The posterior distribution for Mens is also clearly of in¬ 
terest, and is displayed in Figure The prior for this param¬ 
eter was uniform from 0 to 10 (inclusive), and the possibility 
iV = 0 has been (loosely speaking) “ruled out” by the image 
data. The true solution {N = 1) has the highest probability. 
However, the possibilities with N > 1 are still fairly plausi¬ 
ble, since it’s possible that a very small substructure exists, 
and it’s also possible that two substructures near each other 
could mimic the effect of one. The degree to which these 
possibilities are plausible is related to the choice of prior for 
the blob amplitudes and positions. 

The estimated marginal likelihood of our model (av¬ 
eraged over all parameters including Mens and Mrc) is 
In [^(D)] —14141.4, and the information (Kullback- 

Leibler divergence from the prior to the posterior) is ~ 
99.4 nats. The information represents the degree of com¬ 


Figure 4. The marginal posterior distribution for Mens? fhe num¬ 
ber of substructures in the lens. The prior was uniform and the 
true value used to generate the data was 1. However, the data 
is only informative enough to suppress the probability of values 
above 1 slightly, since it is possible (given this data) that low mass 
substructures might exist somewhere, or that what we think is one 
substructure might actually be two close together, and other such 
possibilities. 


pression of the posterior distribution with respect to the 
prior, and can be interpreted quite literally as how much 
was learned about the parameters from the data. It is also 


straightforward to estimate from Nested Sampling ( [Skilling 
2006| ). Its definition is: 

p{e\D 


H = 


fm 


D)\n 


[ P{0) J 


dO 


(15) 


and we can build intuition about its meaning based on some 
simple examples. One such example is a uniform prior over 
a volume Vb and a posterior which is uniform over a smaller 
volume Vi contained within Vb- In this case V, = ln(Vb/Vi) 
nats. Therefore, a value oi H = 100 nats implies the poste¬ 
rior distribution occupies roughly of the prior volume. 

For the sampled parameter values representing the pos¬ 
terior distribution, there was no structure in the residuals 
(not shown), when normalised by the noise standard de¬ 
viation in each pixel. This is unsurprising since the model 
assumptions are entirely correct for this dataset. 


6 “HARDER” SIMULATED DATA 

To further test the algorithm, we created a “harder” sim¬ 
ulated image, where the lens consisted of an SIE +7 profile 
plus 10 additional substructures. The image is shown in Fig¬ 
ure With the easy dataset, there was little hope of mea¬ 
suring the substructure mass function parameter fiiens since 
there was only a single substructure which would provide 
little information about /xiens (only constraining its general 
order of magnitude), even if its mass were measured per¬ 
fectly. 

As with the ‘easy’ dataset, we generated 5,000 sam¬ 
ples from the DNS target distribution and thinned by a 
factor of 10^, so 5 x 10® MCMC iterations were actually 
performed, taking approximately 60 hours. The slower run¬ 
time in this case was because Mens = 10, more CPU 
time was spent exploring parts of the hypothesis space 
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Substructure Positions 



J 


Figure 5. The ‘easy’ simulated data, with substructure positions 
overlaid (a Monte Carlo approximation to the posterior expecta¬ 
tion of the empirical measure). The density of black points in any 
region is proportional to the expected number of substructures 
whose centers lie within that region. In this case, there is strong 
evidence for a substructure close to the top-left image (where one 
was actually placed). 


Simulated Dataset 



Figure 6. The ‘harder’ simulated data, consisting of a more com¬ 
plex source profile, and a lens with an SIE +7 (whose centre is 
indicated by a white circle) and ten substructures. Only five of 
the substructures (star symbols) occured within the square image 
boundary, but the other five still have an effect on the image. 


where Mens is high. The run resulted in 557 equally- 
weighted posterior samples. The posterior distribution for 
the SIE mass and the total substructure mass is given 
in Figure The samples can be viewed as a movie at 
WWW. youtube.com/watch?v=o3ppfKSk248. 

To demonstrate the claim that we can infer something 
about the mass function of substructures directly from the 
image data, we have plotted the posterior distribution for 
Miens, the hyperparameter for the substructure masses, in 
Figure The true value was 1, and the posterior distribu¬ 
tion indicates a fairly wide uncertainty range that is at least 



Figure 7. Same as Figure but for the ‘harder’ simulated 
dataset. 



Figure 8. The posterior distribution for yuiens, the hyperparam¬ 
eter which determines the prior expected value of substructure 
masses, given the ‘harder’ simulated data. The true value (in ar¬ 
bitrary units) was 1 , which is indicated by the vertical line. 


consistent with the true value in some sense. Given that 
the image only contained a few substructures whose masses 
could be measured with any accuracy, the large uncertainty 
is not surprising. 

The estimated marginal likelihood of our model, for the 
‘harder’ simulated data, is \n\p{D)] ^ —5671.9, and the 
information (Kullback-Leibler divergence from the prior to 
the posterior) is H ^ 155.7 nats. As with the easy dataset, 
the standardised residuals of the sampled models resembled 
noise. 


7 THE COSMIC HORSESHOE 


As a further demonstration the method, we apply it to the g- 


band data of the Cosmic Horseshoe J1004+4112 (Belokurov 
Dye et ar]|2008 ) taken with the Isaac Newton 


et al. 


2007 


Telescope (INT). The image is shown in Figure pT] For this 
system, more data is available (i and U band data, as well 
as more recent Hubble Space Telescope (HST) imaging), al¬ 
though the g-band image is the highest signal-to-noise of 
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Figure 9. Same as Figure but for the ‘harder’ simulated 
dataset. The true number of substructures was 10, which also 
happens to be the posterior mode. Only five substructures were 
located within the domain of the image, but Aliens = 10 is prob¬ 
able because the spread-out positions of the substructures imply 
that i^iens is probably large. 


Substructure Positions 



Figure 10. Same as Figure but for the ‘harder’ simulated 
dataset. For all of the substructures located close to the ring, 
the positions were well constrained. There was weak evidence for 
some substructures that did not in fact exist. 


the INT images. The source redshift is Zs = 2.379, and 
the lens is a massive luminous red galaxy at zi — 0.4457. 
Unfortunately our current implementation doesn’t allow for 


multi-band data (unlike Brewer et al. (2011)), and is quite 
slow when running on the larger HST image. Therefore this 
section should be considered a further demonstration of the 
technique, and not a thorough study of this system. 

We generated 5,000 samples from the DNS target dis¬ 
tribution and thinned by a factor of 10^, so 5 x 10® MCMC 
iterations were actually performed, taking approximately 
40 hours. After resampling, this resulted in 718 equally 
weighted posterior samples. Although this image is smaller 
than the simulated data (46 x 46 pixels), we fired 2x2 rays 
per pixel for greater accuracy. 

The joint posterior distribution for the SIE mass total 
substructure mass for the Cosmic Horseshoe is given in Fig- 
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Figure 11. The g-band INT image of the Cosmic Horseshoe, 
with the lens galaxy subtracted. 
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As with the simulated data (Figure]^, 


we see an 

expected negative correlation between these two quantities. 
However, the points are not as smoothly distributed in this 
case (we discuss this issue further in Section |7.1[ ) . An upper 
“limit” on the SIE mass, with 95% posterior probability, is 
5.20 X 10^^ solar masses. The algorithm has also found some 
possibilities where the substructure mass is much greater 
than this. In these cases, the substructures are far from the 
image — it is the environment that is being modelled. 

The posterior distribution for Mens (Figure 13) also 
shows some evidence for more than zero substructures. In 
particular, the prior probability for Mens > 0 was 10/11 
~ 0.91, whereas the posterior probability is approximately 
0.997. This corresponds to a Bayes Factor of about 300 in 
favour of Mens > 0 vcrsus the alternative Mens = 0. 

Despite weak evidence for the existence of substruc¬ 
ture, when we examine the expected value of the empirical 
measure of substructure positions (Figure |14[ ) we find no 
consistency in their positions, unlike for the simulated data 
(Figure!^. This sit uation is not uncommon. For example, 
[Brewer Donov^ (2015) found evidence for a large number 
of Keplerian signals in a time series, but the number of such 
signals with well constrained properties was much lower. Re¬ 
lated to this, we can compute posterior probabilities for any 
hypothesis about the substructure masses. With 95% prob¬ 
ability the substructure mass is less than 1.53 x 10^® solar 
masses and with 75% probability it is less than 5.10x10^^ 
solar masses. As the Einstein ring itself implies a mass of 
about 5 X 10^^ solar masses, these summaries are affected 
by the possibility of blobs outside the main Einstein ring. 

As with any inference, the results presented here may 
be sensitive to many of the input modelling assumptions, 
and a slightly different (yet still reasonable) set of choices 
might yield different results. For example, if the density pro¬ 
file of the lens was smooth but not of the SIE form, and 
the data were informative enough to show this, the current 
model would only be able to explain the data by adding 
substructures. Hence, an alternative explanation for these 
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Figure 12. Same as Figure]^ but for the Cosmic Horseshoe. The 
joint posterior distribution for the SIE mass (integrated within its 
critical ellipse, which is not the critical curve of the lens overall), 
and the total mass in substructures. The units are defined by the 
critical density, so that a mass of tt units would have an Einstein 
radius of one arcsecond. To calculate the masses in solar masses, 
we assumed a fiat cosmology with Qrn = 0.3, = 0.7, and 

Hq = 70 km/s/Mpc. 



Figure 13. Same as Figure^ but for the Cosmic Horseshoe. The 
posterior distribution for the number of lens substructures in the 
Cosmic Horseshoe system. There is moderate evidence in favour 
of the hypothesis that Aliens / 0. 


results is that, rather than containing a substructure, the 
density profile of the lens is simply not within the SIE +7 
family. In fact, some of the posterior samples obtained con¬ 
tain massive substructures outside of the image, which could 
be modelling higher-order effects of the environment beyond 
what is captured by the simple external shear model. This 
is one way of violating the SIE +7 assumption, and another 
is simply to have a different projected density profile. One 
way of further investigating this is to use a different family 
of smooth lens models and doing model selection based on 
the marginal likelihoods. Another option which we defer to 
future work is to implement a more flexible model (such as 


Substructure Positions 



Figure 14. Same as Figure[^ but for the Cosmic Horseshoe. The 
positions of substructures encountered in the MCMC sampling for 
the Cosmic Horseshoe; technically a Monte Carlo approximation 
to the posterior expected value of the empirical measure of sub¬ 
structure positions. Unlike Figure for the Cosmic Horseshoe 
there is little consistency in the positions of the substructures. 


a mixture of concentric elliptical lenses). This result is con¬ 
sistent with that of Dye et al. (2008), who found a slight 
preference for an elliptical power law profile over the SIE 
(which is a specific case of an elliptical power law). 

Another potential inadequacy of our model is the as¬ 
sumption that the PSF is known. In practice, the PSF is 
often estimated from the image of nearby star(s), or from 
theoretical knowledge of the telescope (especially in the case 
of space based imaging). However, if the PSF is misspecified, 
or in fact varies across the image, this could induce subtle 
effects in the imaging which our current model could only 
explain using substructure. 

The estimated marginal likelihood of our model is 
In \p(D)] ^ —6948.5, and the information (Kullback-Leibler 
divergence from the prior to the posterior) is 77 ~ 123.1 
nats. Three example models (lens and source) sampled from 
the posterior are shown in Figure The model was able 
to fit the data down to the noise level. 


7.1 Reproducibility of the results 

The “effective sample size” returned by DNS, which we have 
described as the number of posterior samples, takes into ac¬ 
count the fact DNS’s target distribution is not the poste¬ 
rior. However, it does not take autocorrelation into account, 
and thus can present an optimistic picture of the accuracy 
of any Monte Carlo approximations to posterior quantities. 
Most standard diagnostic techniques used in MCMC can be 
applied here. The simplest of these is a check of reproducibil¬ 
ity. If different runs yield substantially different results, the 
MCMC output should be treated with caution. For exam¬ 
ple, in Figure |12[ there is a correlation between the mass 
attributed to the SIE component and that attributed to 
substructures. However, the distribution also appears some¬ 
what “lumpy” or multimodal. This may not be a feature 
of the actual posterior distribution, but could arise due to 
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Source _Density in substructures 


A 


Image 





Figure 15. Three example sources, the density profile in substructures, and the corresponding lensed, blurred images, representative of 
the posterior distribution. The angular scales are 7.5 arcseconds for the sources and 15 arcseconds for the images. 


imperfect sampling. Whereas a standard Metropolis sampler 
would move around very slowly in the hypothesis space, DNS 
naturally spends a non-negligible fraction of the time sam¬ 
pling the prior. Therefore, a particle exploring the parameter 
space can “forget” its good-fitting position, move somewhere 
completely different, and find another good-fitting model in 
a different location. This is a natural feature of the algo¬ 
rithm (and is also present in related algorithms such as par¬ 
allel tempering). Despite this, the patchy nature of Figure 
suggests that posterior exploration is still challenging in this 
problem. 

The results for the simulated data (Figure]^ were less 
complex because of the definite existence of one substruc¬ 
ture. Its mass and position provide some information about 
the hyperparameters aiens, restricting the probability that 
high mass substructures exist far from the image. 


8 CONCLUSIONS 

We have developed a trans-dimensional Bayesian approach 
motivated directly by the question of whether dark substruc¬ 


tures exist in a lens galaxy, given image data. By making use 


of the Diffusive Nested Sampling algorithm (Brewer, Partay, 
fc Csanyi|2011 ) and the RJObject library ( Brewer|2014 ) 


outsource the difficulties associated with choosing Metropo¬ 
lis proposals for a hierarchical model of non-fixed dimension. 
The model allows for source and lens “blobs” to appear as 
needed to explain the data. The prior for the blobs’ prop¬ 
erties is specified hierarchically, to more realistically model 
sensible prior beliefs, and to tie the model parameters di¬ 
rectly to questions of scientific interest such as the mass 
function of substructures. 


As a proof of concept, we demonstrated the successful 
recovery of a single substructure from simulated data for 
which all of the model assumptions were true. We then ap¬ 
plied the method to an image of the Cosmic Horseshoe sys¬ 
tem and found moderate evidence in favour of the existence 
of a substructure, or at the very least, a departure from a 
simple SIE +7 lens profile. The main output of our method 
is a set of posterior samples, each representing a plausible 
scenario for the lens and the source given the data and the 
assumed prior information. These samples can be displayed 
as a movie, which is a very useful method for intuitively un- 
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derstanding the remaining uncertainty. Any posterior sum¬ 
maries of interest can be produced from the samples, but the 
interpretation of many of these summaries is not necessarily 
straightforward. We have suggested that the posterior ex¬ 
pectation of the empirical measure of substructure positions 
is one of the most helpful summaries. 

The model assumptions used in this paper are fairly 
simple. In future work we intend to generalize the model 
to multi-band data, and extend the smooth lens model be¬ 
yond the overly simplistic SIE +7 assumption. Applications 
to other systems are also forthcoming, as well as variants 
on the model using different conditional priors for the sub¬ 
structure properties given the hyperparameters. 
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